src: add Latin1 fast path in StringBytes::Encode utf8 #63385

Open
mertcanaltin wants to merge 1 commit into nodejs:main from mertcanaltin:mert/buffer-tostring-utf8-latin1
@mertcanaltin (Member) commented May 17, 2026

In the utf8 branch of StringBytes::Encode, content whose code points all fit in Latin-1 was going through the UTF-16 path. I added a Latin-1 fast path that converts the input via simdutf and returns a one-byte V8 string. This benefits every Buffer.toString('utf8') call, including fs.readFile/promises.readFile when they delegate to it.
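
To make the idea concrete, here is a rough scalar sketch of the "fits in Latin-1" check and conversion. It is not the code in this patch: the actual change relies on simdutf, and the helper name `Utf8ToLatin1IfFits` is hypothetical. The sketch assumes that any input it cannot handle (code points above U+00FF, or malformed sequences) simply falls back to the existing general path.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only; a scalar stand-in for the
// simdutf-based conversion described above.
// Returns the Latin-1 bytes if every code point in the UTF-8 input is
// <= U+00FF, or std::nullopt if the caller should use the general path.
std::optional<std::string> Utf8ToLatin1IfFits(std::string_view utf8) {
  std::string latin1;
  latin1.reserve(utf8.size());
  for (std::size_t i = 0; i < utf8.size(); ++i) {
    const auto byte = static_cast<std::uint8_t>(utf8[i]);
    if (byte < 0x80) {
      // ASCII maps to itself in Latin-1.
      latin1.push_back(static_cast<char>(byte));
      continue;
    }
    // U+0080..U+00FF is always a two-byte sequence with lead byte 0xC2 or 0xC3.
    if ((byte != 0xC2 && byte != 0xC3) || i + 1 >= utf8.size()) {
      return std::nullopt;
    }
    const auto next = static_cast<std::uint8_t>(utf8[i + 1]);
    if ((next & 0xC0) != 0x80) {
      return std::nullopt;  // not a continuation byte
    }
    // Combine the low bits of both bytes into a single Latin-1 byte.
    latin1.push_back(static_cast<char>(((byte & 0x1F) << 6) | (next & 0x3F)));
    ++i;  // skip the continuation byte
  }
  return latin1;
}
```

With this sketch, `"caf\xC3\xA9"` (UTF-8 for "café") yields the four Latin-1 bytes `c a f 0xE9`, while a CJK sequence such as `"\xE4\xB8\xAD"` yields `std::nullopt`. The resulting bytes are the kind of data that can back a one-byte V8 string instead of a two-byte one.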

@nodejs/performance @mcollina @anonrig @lemire @addaleax

Benchmark results:

➜  node git:(mert/buffer-tostring-utf8-latin1) ✗ node-benchmark-compare ./buffer-result.csv
                                                                                      confidence improvement accuracy (*)    (**)   (***)
buffers/buffer-tostring-utf8-latin1.js n=10000 content='ascii' size=1024                              0.34 %       ±1.92%  ±2.56%  ±3.35%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='ascii' size=16384                            -0.57 %       ±1.01%  ±1.34%  ±1.74%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='ascii' size=262144                            0.80 %       ±1.13%  ±1.51%  ±1.96%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='ascii' size=4194304                    *      0.52 %       ±0.46%  ±0.61%  ±0.80%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='ascii' size=64                                9.85 %      ±30.50% ±40.59% ±52.85%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1_then_cjk' size=1024           ***      5.25 %       ±1.60%  ±2.13%  ±2.78%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1_then_cjk' size=16384          ***     24.32 %       ±0.71%  ±0.95%  ±1.24%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1_then_cjk' size=262144         ***     24.33 %       ±0.86%  ±1.15%  ±1.51%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1_then_cjk' size=4194304                -0.06 %       ±0.53%  ±0.71%  ±0.92%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1_then_cjk' size=64                     -2.30 %       ±2.95%  ±3.93%  ±5.13%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1' size=1024                    ***      9.02 %       ±3.85%  ±5.17%  ±6.82%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1' size=16384                   ***     34.52 %       ±1.42%  ±1.90%  ±2.50%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1' size=262144                  ***     30.06 %       ±1.48%  ±1.98%  ±2.62%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1' size=4194304                          0.09 %       ±0.42%  ±0.56%  ±0.73%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='latin1' size=64                        *      6.60 %       ±5.00%  ±6.70%  ±8.82%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='utf8_mixed' size=1024                ***      6.47 %       ±1.88%  ±2.50%  ±3.27%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='utf8_mixed' size=16384               ***     33.69 %       ±1.62%  ±2.16%  ±2.82%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='utf8_mixed' size=262144              ***     47.29 %       ±1.09%  ±1.45%  ±1.89%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='utf8_mixed' size=4194304                     -0.31 %       ±0.61%  ±0.82%  ±1.07%
buffers/buffer-tostring-utf8-latin1.js n=10000 content='utf8_mixed' size=64                    *     -2.47 %       ±2.25%  ±2.99%  ±3.89%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 20 comparisons, you can thus expect the following amount of false-positive results:
  1.00 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.20 false positives, when considering a   1% risk acceptance (**, ***),
  0.02 false positives, when considering a 0.1% risk acceptance (***)
➜  node git:(mert/buffer-tostring-utf8-latin1) ✗

@nodejs-github-bot (Collaborator)

Review requested:

  • @nodejs/performance

@nodejs-github-bot added the buffer, c++, and needs-ci labels May 17, 2026

codecov Bot commented May 17, 2026

Codecov Report

❌ Patch coverage is 68.25397% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.05%. Comparing base (265679b) to head (73d9df0).
⚠️ Report is 14 commits behind head on main.

Files with missing lines    Patch %    Lines
src/string_bytes.cc         68.25%     14 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #63385      +/-   ##
==========================================
- Coverage   90.05%   90.05%   -0.01%     
==========================================
  Files         714      714              
  Lines      225628   225795     +167     
  Branches    42673    42739      +66     
==========================================
+ Hits       203198   203332     +134     
- Misses      14225    14240      +15     
- Partials     8205     8223      +18     
Files with missing lines    Coverage Δ
src/string_bytes.cc         73.21% <68.25%> (-1.42%) ⬇️

... and 53 files with indirect coverage changes

@mertcanaltin (Member, Author)

This also covers @addaleax's review on #63370: fs picks it up through the shared StringBytes path.

Labels

buffer: Issues and PRs related to the buffer subsystem.
c++: Issues and PRs that require attention from people who are familiar with C++.
needs-ci: PRs that need a full CI run.
